Introduction: Why Code for Data Science?

Phil Chodrow

Tuesday, August 27th, 2019

What is Data Science?

Some Things That Aren’t Data Science

The Cloud™

Deep Learning

Pickup Lines Generated By Janelle Shane’s Neural Network

  • You look like a thing and I love you.
  • I have to give you a book, because you’re the only thing in your eyes.
  • You are so beautiful that you know what I mean.

Link: https://aiweirdness.com

BIG DATA!!1!!

Data Science Is:

  • Gathering data that matters.
  • Asking questions that matter about your data.
  • Choosing appropriate methods to answer those questions.
  • Implementing solutions that meet stakeholder needs.

Data Science Tools

You Can Do Data Science With:

  • A pencil and paper
  • A calculator
  • Excel
  • R, Julia, Python, …

Why Not Excel?

Why Code?

Why R for Data Analysis?

  • R is the best language in the world for learning data science.
  • R is one of the best languages in the world for doing data science.
  • R tends to be preferred in academia and among “statisticians,” while Python is more popular among “computer scientists” and “data scientists.”
  • Most practicing data scientists know and use both.

Why Julia and JuMP for Optimization?

EMMA PLEASE FILL IN

…yes, there will be an opportunity to learn Python later in the semester.

Learning Goals

What can you pick up in two days?

  • You are not going to become an R or Julia expert in two days.
  • But…
  • You will know the basic concepts and vocabulary of data science – enough to employ the most important skill of all.

The most important skill of all…


Gameplan

  1. Today: Version Control, Basic Data Analysis and Visualization in R
  2. Tomorrow: Optimization in Julia and JuMP; presenting your work.
  3. Both days: mini-project, partner work, lots of exercises.

Exercise 0

  1. Look left.
  2. Look right.
  3. Pick a partner (groups of 3 are fine).
  4. Give them a professional yet friendly smile.
  5. You are going to need them soon.